Rotated Canonical Correlation Analysis for Multilingual Corpora

Authors

  • Simona Balbi
  • Michelangelo Misuraca
Abstract

This paper proposes the joint use of Canonical Correlation Analysis and Procrustes Rotations (RCA) for analyzing a text and its translation into another language. The basic idea is to represent words from the two natural languages in a common reference space. The main characteristic of this space is that it is language independent, even though the Procrustes Rotation transforms the lexical table derived from the translation by minimizing its distance from the lexical table of the original corpus, while the subsequent Canonical Correlation Analysis treats the two word sets symmetrically. The most interesting feature of RCA is that it builds a single reference space for representing the correlation structure in the data, inducing the two systems of canonical factors to lie in the same space. The resulting graphical representations enable us to read distances between corresponding points in terms of different ways of translating the same word, in relation to the general context defined by the canonical variates. Understanding the distances between matched points could be a useful tool for enriching lexical resources in a translation procedure. We compare the most frequent content-bearing words in the two languages, analyzing one year (2003) of Le Monde Diplomatique and its Italian edition.
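
The two steps described in the abstract (a Procrustes rotation of the translated lexical table toward the original one, followed by a Canonical Correlation Analysis) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes that both lexical tables have the same shape with matched rows, it uses randomly generated toy data rather than the Le Monde Diplomatique corpus, and the function names `procrustes_rotation` and `canonical_correlations` are hypothetical.

```python
import numpy as np


def procrustes_rotation(X, Y):
    """Return the orthogonal matrix R that minimizes ||Y @ R - X||_F."""
    # Classical orthogonal Procrustes solution: SVD of the cross-product matrix.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt


def canonical_correlations(X, Y):
    """Canonical correlations between two tables with matched rows.

    Assumes both column-centered tables have full column rank.
    """
    # Orthonormal bases of the two centered column spaces.
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx' Qy are the canonical correlations.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


# Toy data (assumption): X stands in for the original lexical table,
# Y for the table derived from the translation.
rng = np.random.default_rng(0)
n_docs, n_terms = 50, 3
X = rng.normal(size=(n_docs, n_terms))
Q_true, _ = np.linalg.qr(rng.normal(size=(n_terms, n_terms)))
Y = X @ Q_true + 0.01 * rng.normal(size=(n_docs, n_terms))

# Step 1: rotate the translated table toward the original one.
R = procrustes_rotation(X, Y)
Y_rot = Y @ R

# Step 2: canonical correlation analysis on the aligned tables.
rho = canonical_correlations(X, Y_rot)

print("distance before rotation:", np.linalg.norm(Y - X))
print("distance after rotation: ", np.linalg.norm(Y_rot - X))
print("canonical correlations:  ", rho)
```

In this sketch the rotation changes the coordinates in which the translated table is displayed, not the canonical correlations themselves, which are invariant to any invertible linear transformation of either table.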

Similar resources

Improving Vector Space Word Representations Using Multilingual Correlation

The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique ...

Deep Multilingual Correlation for Improved Word Embeddings

Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of ...

Multivariate Characterisation of Oulmes-Zaer and Tidili Cattle Using the Morphological Traits

Fourteen different morphological traits in 169 and 131 cattle of Oulmes-Zaer and Tidili, respectively were recorded and analyzed using a multivariate approach. The characters measured included heart girth, wither height, rump height, rump length, rump width, chest depth, body length, neck length, cannon circumference, ear length, ear width, head length, horn length and tail length. Breed signif...

Canonical Correlation Analysis for Determination of Relationship between Morphological and Physiological Pollinated Characteristics in Five Varieties of Phalaenopsis

Phalaenopsis is an important genus of orchids that is grown for economical production of cut flower and potted plants. The objective of this study is the evaluation of correlation between morphological and physiological traits of self and cross-pollination of 5 varieties of Phalaenopsis orchid. Some morphological traits were measured: Capsule length (CL), capsule volume (CV), weight of seeds in...

Canonical Analysis of the Relationship between Components of Professional Ethics and Dimensions of Social Responsibility

Background: Today, professional ethics and social responsibility play an important role in organizations. This study aimed to conduct a canonical analysis of the relationship between the components of professional ethics and the dimensions of social responsibility among first high school teachers in the Naghadeh province. Method: In terms of purpose, this study is applied, and in terms of data collec...

Publication date: 2006